Multi-view Automatic Lip-Reading Using Neural Network
نویسندگان
چکیده
It is well known that automatic lip-reading (ALR), also known as visual speech recognition (VSR), enhances the performance of speech recognition in a noisy environment and also has applications itself. However, ALR is a challenging task due to various lip shapes and ambiguity of visemes (the basic unit of visual speech information). In this paper, we tackle ALR as a classification task using end-to-end neural network based on convolutional neural network and long short-term memory architecture. We conduct single, cross, and multi-view experiments in speaker independent setting with various network configuration to integrate the multi-view data. We achieve 77.9%, 83.8%, and 78.6% classification accuracies in average on single, cross, and multi-view respectively. This result is better than the best score (76%) of preliminary single-view results given by ACCV 2016 workshop on multi-view lip-reading/audiovisual challenges. It also shows that additional view information helps to improve the performance of ALR with neural network architecture.
منابع مشابه
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
متن کاملResource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion
Today’s Automatic Speech Recognition systems only rely on acoustic signals and often don’t perform well under noisy conditions. Performing multi-modal speech recognition processing acoustic speech signals and lip-reading video simultaneously significantly enhances the performance of such systems, especially in noisy environments. This work presents the design of such an audio-visual system for ...
متن کاملA multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...
متن کاملLip Reading in Profile
There has been a quantum leap in the performance of automated lip reading recently due to the application of neural network sequence models trained on a very large corpus of aligned text and face videos. However, this advance has only been demonstrated for frontal or near frontal faces, and so the question remains: can lips be read in profile to the same standard? The objective of this paper is...
متن کاملAutomatic Lip Reading for Daily Indonesian Words Based on Frame Difference and Horizontal-vertical Image Projection
Automatic lip reading is one of research being developed lately. Automatic lip reading has been used for various purposes, such as enhancing speech recognition and aid to speech training for the deaf. There are two approaches in lip feature extraction, namely appearance based and shape based. Appearance based approach is usually better, because it provides visual features that cover not only li...
متن کامل